CDF is like the “API” that allows you to access all of the information about the distribution (pdf/pmf is derived from the CDF)
Example: we know there’s some “thing” called the Exponential Distribution…
How do we use this distribution to understand a random variable \(X \sim \text{Exp}\)?
Answer: the CDF of \(X\)!
Since all exponentially-distributed RVs with a given rate \(\param{\lambda}\) have the same PDF, we can call this PDF “the” exponential distribution
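To make the “pdf is derived from the CDF” point concrete, here is a quick sketch in R, comparing a finite-difference derivative of the built-in exponential CDF `pexp()` against the pdf `dexp()` (the rate \(\lambda = 2\) and the evaluation point are arbitrary choices for illustration):

```r
# The pdf is the derivative of the CDF: check numerically for Exp(lambda)
lambda <- 2    # arbitrary rate, for illustration
v <- 1.5       # arbitrary evaluation point
h <- 1e-6
# Finite-difference slope of the CDF at v
cdf_slope <- (pexp(v + h, rate = lambda) - pexp(v, rate = lambda)) / h
cdf_slope                 # approximately equal to...
dexp(v, rate = lambda)    # ...the pdf evaluated at v
```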
Say we want to find the median of \(X\): The median is the number(s) \(m\) satisfying
\[
\Pr(X \leq m) = \frac{1}{2}
\]
Finding a Median via the CDF
Median of a Random Variable \(X\)
The median of a random variable \(X\) with some CDF \(F_X(v_X)\) is the [set of] numbers \(m\) for which the probability that \(X\) is at most \(m\) is \(\frac{1}{2}\):
(In case you’re wondering why we start with the median rather than the more commonly-used mean: it’s specifically because I want you to get used to calculating general functions \(f(X)\) of a random variable \(X\). It’s easy to just e.g. learn how to compute the mean \(\expect{X}\) and forget that this is only one of many possible choices for \(f(X)\).)
Median via CDF Example
Example: If \(X \sim \text{Exp}(\param{\lambda})\), what is the median of \(X\)?
Answer: It has no meaning outside of its context: a random variable with a CDF giving the distribution of its possible values
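For \(X \sim \text{Exp}(\lambda)\), solving \(F_X(m) = 1 - e^{-\lambda m} = \frac{1}{2}\) gives \(m = \frac{\ln 2}{\lambda}\); a quick sketch checking this against R’s built-in quantile function (the rate \(\lambda = 2\) is an arbitrary choice):

```r
# Median of Exp(lambda): solve 1 - exp(-lambda * m) = 1/2  =>  m = log(2)/lambda
lambda <- 2                 # arbitrary rate, for illustration
qexp(0.5, rate = lambda)    # R's quantile function inverts the CDF
log(2) / lambda             # closed-form median, ln(2)/lambda
```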
Top Secret Fun Fact
Every Discrete Distribution is [technically, in a weird way] a Continuous Distribution!
Same intuition as why every natural number is a real number, but the converse is not true
Marble example: Let \(X\) be an RV defined on this space, so that \(X(A) = 1\), \(X(B) = 2\), \(X(C) = 3\), \(X(D) = 4\). Then the pmf for \(X\) is \(p_X(i) = \frac{1}{4}\) for \(i \in \{1, 2, 3, 4\}\).
We can then use the Dirac delta function \(\delta(v)\) to define a continuous pdf \(f_X(v) = \frac{1}{4}\sum_{i=1}^{4}\delta(v - i)\)
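A sketch of this construction: approximating each \(\delta(v - i)\) by a very narrow Gaussian lets us evaluate the resulting CDF numerically (the width `eps` is an arbitrary small choice):

```r
# CDF of the delta-based pdf f_X(v) = (1/4) * sum_i delta(v - i), with each
# delta spike approximated by a Gaussian of tiny width eps
F_approx <- function(v, eps = 1e-4) sum(pnorm(v, mean = 1:4, sd = eps)) / 4
F_approx(2.5)   # = Pr(X <= 2.5) = 2/4 = 0.5, matching the discrete CDF
F_approx(4.5)   # = 4/4 = 1
```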
```r
k <- seq(0, 10)
prob <- dbinom(k, 10, 0.5)
bar_data <- tibble(k, prob)
ggplot(bar_data, aes(x=k, y=prob)) +
  geom_bar(stat="identity", fill=cbPalette[1]) +
  labs(
    title="Binomial Distribution, N = 10, p = 0.5",
    y="Probability Mass"
  ) +
  scale_x_continuous(breaks=seq(0,10)) +
  dsan_theme("half")
```
The Emergence of Order
Who can guess the state of this process after 10 steps, with 1 person?
10 people? 50? 100? (If they find themselves on the same spot, they stand on each other’s heads)
100 steps? 1000?
The Result: 16 Steps
```r
library(tibble)
library(ggplot2)
library(ggExtra)
library(dplyr)
library(tidyr)
# From McElreath!
gen_histo <- function(reps, num_steps) {
  support <- c(-1, 1)
  pos <- replicate(
    reps,
    sum(sample(support, num_steps, replace=TRUE, prob=c(0.5, 0.5)))
  )
  #print(mean(pos))
  #print(var(pos))
  pos_df <- tibble(x=pos)
  clt_distr <- function(x) dnorm(x, 0, sqrt(num_steps))
  plot <- ggplot(pos_df, aes(x=x)) +
    geom_histogram(aes(y = after_stat(density)), fill=cbPalette[1], binwidth = 2) +
    stat_function(fun = clt_distr) +
    dsan_theme("quarter") +
    theme(title=element_text(size=16)) +
    labs(
      title=paste0(reps, " Random Walks, ", num_steps, " Steps")
    )
  return(plot)
}
gen_walkplot <- function(num_people, num_steps, opacity=0.15) {
  support <- c(-1, 1)
  # Unique id for each person
  pid <- seq(1, num_people)
  pid_tib <- tibble(pid)
  all_steps <- t(replicate(num_people, sample(support, num_steps, replace = TRUE, prob = c(0.5, 0.5))))
  csums <- t(apply(all_steps, 1, cumsum))
  csums <- cbind(0, csums)
  # Last col is the ending positions
  ending_pos <- csums[, dim(csums)[2]]
  end_tib <- tibble(pid = seq(1, num_people), endpos = ending_pos, x = num_steps)
  # Name the matrix columns V1..Vn explicitly, then convert to tibble
  # (avoids the as_tibble() name-repair compatibility warning)
  colnames(csums) <- paste0("V", seq_len(ncol(csums)))
  ctib <- as_tibble(csums)
  merged_tib <- bind_cols(pid_tib, ctib)
  long_tib <- merged_tib %>% pivot_longer(!pid)
  # Convert name -> step_num
  long_tib <- long_tib %>% mutate(step_num = strtoi(gsub("V", "", name)) - 1)
  grid_color <- rgb(0, 0, 0, 0.1)
  # And plot!
  walkplot <- ggplot(
    long_tib,
    aes(
      x = step_num,
      y = value,
      group = pid
      # color=factor(label)
    )
  ) +
    geom_line(linewidth = g_linesize, alpha = opacity, color = cbPalette[1]) +
    geom_point(data = end_tib, aes(x = x, y = endpos), alpha = 0) +
    scale_x_continuous(breaks = seq(0, num_steps, num_steps / 4)) +
    scale_y_continuous(breaks = seq(-20, 20, 10)) +
    dsan_theme("quarter") +
    theme(
      legend.position = "none",
      title = element_text(size = 16)
    ) +
    theme(
      panel.grid.major.y = element_line(color = grid_color, linewidth = 1, linetype = 1)
    ) +
    labs(
      title = paste0(num_people, " Random Walks, ", num_steps, " Steps"),
      x = "Number of Steps",
      y = "Position"
    )
}
wp1 <- gen_walkplot(500, 16, 0.05)
```
```r
ggMarginal(wp1, margins = "y", type = "histogram", yparams = list(binwidth = 1))
```
The important part (imo): this is the most conservative out of all possible (symmetric) prior distributions defined on \(\mathbb{R}\) (defined from \(-\infty\) to \(\infty\))
“Most Conservative” How?
Of all possible distributions with mean \(\mu\), variance \(\sigma^2\), \(\mathcal{N}(\mu, \sigma^2)\) is the entropy-maximizing distribution
Roughly: using any other distribution (implicitly/secretly) imports additional information beyond the fact that mean is \(\mu\) and variance is \(\sigma^2\)
Example: let \(X\) be an RV. If we know its mean is \(\mu\) and its variance is \(\sigma^2\), but then we learn that \(X \neq 3\), or that \(X\) is even, or that the 15th digit of \(X\) is 7, we can update to derive a “better” distribution (incorporating this info)
The Takeaway
Given info we know, we can find a distribution that “encodes” only this info
More straightforward example: if we only know that the value is something in the range \([a,b]\), the entropy-maximizing distribution is the Uniform Distribution
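A numerical sketch of this claim on \([0,1]\): compare the differential entropy \(-\int f \ln f\) of the Uniform pdf with that of a Beta(2,2) pdf on the same support (Beta(2,2) is an arbitrary comparison choice):

```r
# Differential entropy: -integral of f(x) * log(f(x)) over [0, 1]
diff_entropy <- function(f) {
  integrand <- function(x) {
    fx <- f(x)
    ifelse(fx > 0, -fx * log(fx), 0)
  }
  integrate(integrand, 0, 1)$value
}
diff_entropy(function(x) dunif(x))        # Uniform(0,1): entropy 0, the maximum
diff_entropy(function(x) dbeta(x, 2, 2))  # Beta(2,2): strictly lower (negative)
```

Any non-uniform choice on \([0,1]\) “secretly” encodes extra information about where the value is likely to fall, which is exactly what drags its entropy below the Uniform’s.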
The Dreaded Cauchy Distribution
Paxton is a Denver Nuggets fan, while Jeff is a Washington Wizards fan. Paxton creates an RV \(D\) modeling how many games above .500 the Nuggets will be in a given season, while Jeff creates an RV \(W\) modeling how many games above .500 the Wizards will be.
They decide to combine their RVs to create a new RV, \(R = \frac{D}{W}\), which now models how much better the Nuggets will be in a season (\(R\) for “Ratio”)
For example, if the Nuggets are \(10\) games above .500, while the Wizards are only \(5\) above .500, \(R = \frac{10}{5} = 2\). If they’re both 3 games above .500, \(R = \frac{3}{3} = 1\).
So What’s the Issue?
So far so good. It turns out (though Paxton and Jeff don’t know this) that the teams are actually both mediocre, so that \(D \sim N(0,10)\) and \(W \sim N(0,10)\)… What is the distribution of \(R\) in this case?
\[
\begin{gather*}
R \sim \text{Cauchy}\left( 0, 1 \right)
\end{gather*}
\]
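A simulation sketch of this fact (the seed and sample size are arbitrary choices): the empirical quartiles of the ratio of two independent \(N(0, 10)\) draws should match the Cauchy(0,1) quartiles \((-1, 0, 1)\):

```r
set.seed(5100)       # arbitrary seed, for reproducibility
n <- 100000
d <- rnorm(n, 0, 10) # Nuggets games above .500
w <- rnorm(n, 0, 10) # Wizards games above .500
r <- d / w           # the ratio RV R
quantile(r, c(0.25, 0.50, 0.75))   # empirical quartiles, roughly (-1, 0, 1)
qcauchy(c(0.25, 0.50, 0.75))       # Cauchy(0,1) quartiles: exactly -1, 0, 1
```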
If a stick is broken at random into three pieces, what is the probability that the pieces can be put back together into a triangle?
This cannot be answered without additional information about the exact method of breaking
One method is to select two points independently and uniformly at random along the stick, then break the stick at these two points
Suppose, however, that we interpret in a different way the statement “break a stick at random into three pieces”. We break the stick at random, we select randomly one of the two pieces, and we break that piece at random.
Will these two interpretations result in the same probabilities?
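A Monte Carlo sketch of both interpretations (sample size and seed are arbitrary choices); the known closed-form answers are \(\frac{1}{4}\) for the first and \(\ln 2 - \frac{1}{2} \approx 0.193\) for the second, so the two readings do *not* agree:

```r
set.seed(5100)   # arbitrary seed
n <- 100000
# Interpretation 1: two independent Uniform(0,1) break points
u1 <- runif(n); u2 <- runif(n)
a <- pmin(u1, u2); b <- pmax(u1, u2)
tri1 <- (a < 0.5) & (b - a < 0.5) & (1 - b < 0.5)  # all pieces shorter than 1/2
mean(tri1)   # approx 1/4
# Interpretation 2: break once, pick one piece at random, break that piece
x <- runif(n)                               # first break point
picked <- ifelse(runif(n) < 0.5, x, 1 - x)  # length of randomly-chosen piece
y <- runif(n) * picked                      # second break, within chosen piece
p1 <- y; p2 <- picked - y; p3 <- 1 - picked
tri2 <- (p1 < 0.5) & (p2 < 0.5) & (p3 < 0.5)
mean(tri2)   # approx ln(2) - 1/2 = 0.193, a different answer
```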
It enables conversion of discrete distributions into continuous distributions as it represents an “infinite point mass” at \(0\) that can be integrated1:
\[
\delta(v) = \begin{cases}\infty & v = 0 \\ 0 & v \neq 0\end{cases}
\]
Its integral also has a name: integrating from \(-\infty\) up to \(v\) produces the Heaviside step function \(\theta(v)\):
\[
\theta(v) = \int_{-\infty}^{v}\delta(t)\,dt = \begin{cases} 1 & v \geq 0 \\ 0 & v < 0\end{cases}
\]
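A numerical sketch: approximating \(\delta(v)\) by a Gaussian of tiny width `eps` (an arbitrary small choice), its integral up to \(v\) behaves like the Heaviside step:

```r
# Integral from -Inf to v of a narrow-Gaussian approximation to delta(v);
# this is exactly pnorm(v, 0, eps), and it approximates theta(v)
theta_approx <- function(v, eps = 1e-3) pnorm(v, mean = 0, sd = eps)
theta_approx(-0.1)   # approx 0, matching theta(v) for v < 0
theta_approx(0.1)    # approx 1, matching theta(v) for v >= 0
```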
Footnotes
This is leaving out some of the complexities of defining this function so it “works” in this way: for example, we need to use the Lebesgue integral rather than the (standard) Riemann integral for it to be defined at all, and even then it technically fails the conditions necessary for a fully-well-defined Lebesgue integral. For full details see this section from the Wiki article on PDFs, and follow the links therein.↩︎